3 - Clustering Metrics

Author

CDN team

Published

March 7, 2024

Introduction

In scRNAseq, how do we know our clusters are good? Most of us go through a laundry list of tasks: * Have we found expected populations? + Do gene markers or sets match our expectation? + Have we found our rare cell type of interest? * Are there any doublets? * Are there any low-quality clusters? * Is there a batch effect?

And much of this is accomplished by checking qualitative aspects of our clusters * Are the clusters separated on the UMAP? * Do heatmaps show overlap of genes across clusters?

And the qualitative aspect of clustering causes us to cluster, annotate, cluster again, annotate again, and cluster again iteratively until we meet our expectations. Only rarely are quantitative methods used to determine cluster quality.

In this notebook, I will introduce some intuitive quantitative clustering metrics that will help quantitatively describe how well formed our clusters are and where there might be batch effects.

Useful Resources

Key Takeaways

  • Quantitative clustering metrics can help us assess and optimize clustering in single-cell data
    • sample diversity
    • silhouette width
    • cluster size
    • distance between clusters

Libraries

Installation

if (!requireNamespace("tidyverse", quietly = TRUE))
    install.packages('tidyverse')
if (!requireNamespace("Seurat", quietly = TRUE))
    install.packages('Seurat')
if (!requireNamespace("plotly", quietly = TRUE))
    install.packages('plotly')
if (!requireNamespace("colorBlindness", quietly = TRUE))
    install.packages('colorBlindness')
if (!requireNamespace("cluster", quietly = TRUE))
    install.packages('cluster') # The cluster package provides tools to calculate clustering metrics, here used for silhouette analysis
if (!requireNamespace("RColorBrewer", quietly = TRUE))
    install.packages('RColorBrewer')
if (!requireNamespace("scales", quietly = TRUE))
    install.packages('scales')
if (!requireNamespace("viridis", quietly = TRUE))
    install.packages('viridis')
if (!requireNamespace("ARBOL", quietly = TRUE))
    devtools::install_github('jo-m-lab/ARBOL') # ARBOL is used to plot clusters.
if (!requireNamespace("presto", quietly = TRUE))
    devtools::install_github('immunogenomics/presto') # presto is used to speed up Wilcoxon tests for marker gene calculation in Seurat
if (!requireNamespace("vegan", quietly = TRUE))
    BiocManager::install('vegan') # vegan is a community ecological analysis package that provides many tools for dissimilarity, ordination, and diversity analysis, we will use it for diversity analysis here

Loading

suppressPackageStartupMessages({
  library(Seurat)
  library(tidyverse)
  library(ARBOL)
  library(plotly)
  library(colorBlindness)
  library(scales)
  library(RColorBrewer)
  library(viridis)
  library(vegan)
  library(cluster)
})
set.seed(687)

Load data

Load Seurat and replace gene symbols

srobj <- readRDS('/Users/kylekimler/Downloads/d8e35450-de43-451a-9979-276eac688bce.rds')
genes <- read_csv('/Users/kylekimler/CDN/workshops/workshop1/data/cov_flu_gene_names_table.csv') # Load a provided gene conversion table to convert ENSG to readable gene symbols

# Need to remake seurat object
mtx <- srobj@assays$RNA@data
rownames(mtx) <- genes[match(row.names(mtx),genes$index),]$feature_name

srobj <- CreateSeuratObject(counts = mtx, meta.data = srobj@meta.data)

srobj
An object of class Seurat 
33234 features across 59572 samples within 1 assay 
Active assay: RNA (33234 features, 0 variable features)
 1 layer present: counts

Setup

Generate a color palette for plotting

# https://www.datanovia.com/en/blog/easy-way-to-expand-color-palettes-in-r/
# nb.cols <- length(unique(se$Celltype))
# mycolors <- colorRampPalette(paletteMartin)(nb.cols)
pal <- paletteMartin
names(pal) <- sort(unique(srobj$Celltype))

Seurat pre-processing for cluster visualization (UMAP)

srobj <- srobj %>%
    NormalizeData(verbose = FALSE) %>%
    FindVariableFeatures(
        method = "vst",
        nfeatures = 3000,
        verbose = FALSE) %>%
    ScaleData(verbose = FALSE) %>%
    RunPCA(verbose = FALSE) %>%
    FindNeighbors %>%
    FindClusters(resolution = 0.5) %>%
    RunUMAP(dims = 1:30, verbose = FALSE, n.components=3L)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 59572
Number of edges: 1894745

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9287
Number of communities: 21
Elapsed time: 12 seconds

Viewing clusters

Typically in scRNA analysis, clusters are viewed by coloring cells on a dimension-reduced latent space of gene expression, like a tSNE or a UMAP. In the dataset we’ve downloaded, we have author annotations and we have just calculated clusters with Seurat as well.

By plotting many UMAPs per sample or per group, we can start to understand how clusters behave across samples.

d1 <- DimPlot(srobj, 
        reduction='umap', 
        group.by='Celltype',
        cols = pal)

d2 <- DimPlot(srobj, 
        reduction='umap', 
        group.by='seurat_clusters')

d3 <- DimPlot(srobj, 
        reduction='umap', 
        group.by='Sample ID') 

d1 | d2 | d3

DimPlot(srobj, 
        reduction='umap', 
        group.by='seurat_clusters', 
        split.by='Sample ID',
        ncol=4) & theme_minimal()

For example, we might see that Flu patients 1 and 5 have clusters that are unique to them. But that doesn’t tell us much about the clusters themselves. The authors have annotated them as Erythrocytes. We might also notice there are many unique clusters corresponding to classical monocytes.

These clusters are often validated by plotting heatmaps, dotplots, or violin plots of genes that are most important to each cluster, found by 1 v all differential expression.

Idents(srobj) <- srobj@meta.data$seurat_clusters
celltype_markers <- FindAllMarkers(srobj, 
                        only.pos=TRUE, 
                        logfc.threshold=0.25,
                        min.diff.pct=0.05,
                        max.cells.per.ident = 200
                        )

top_cluster_features <- celltype_markers %>% group_by(cluster) %>% 
                                        filter(p_val_adj < 0.01) %>%
                                        slice_max(avg_log2FC, n=10)

hm <- DoHeatmap(srobj, features=top_cluster_features$gene, raster = TRUE, )

hm

And there are quite a few methods for displaying genes across clusters detailed in Seurat vignettes

Slightly beyond the norm: 3D

From the UMAPs, we see that some clusters and cell types are somewhat well defined but as usual there are some fuzzy boundaries. If we view a third dimension, we can see some additional separation inside these clusters.

emb <- Seurat::Embeddings(srobj,reduction='umap')
emb <- emb %>% as.data.frame %>% rownames_to_column('CellID') %>% 
left_join(srobj@meta.data %>% rownames_to_column("CellID"))

suppressMessages({
p <- plot_ly(emb, type='scatter3d', 
                color = ~seurat_clusters, 
                size = 0.5,
                x = ~umap_1, 
                y = ~umap_2, 
                z = ~umap_3, 
                cols = pal)
p
    })

But we could look at many more UMAP dimensions and see similar differences.

Quantifying cluster attributes

Even with 3d we don’t have a way to quantify how useful our clusters are as labels in our dataset. Firstly, with a UMAP or tSNE, we can’t see how well represented samples are across clusters. Some people build cluster-composition bar graphs for this, but these can be difficult to compare, and they don’t show us how large the clusters are. One solution is to make a plot of cluster metrics per cluster. Let’s start with cell counts and sample diversity.

Cluster diversity

In ecological analysis, diversity indices quantify species richness in an ecosystem. Here, we flip the idea on its head, calculating the diversity of samples in a specific cluster of scRNA data. Diversity indices are analogous to information theory metrics that calculate the degree of uncertainty when predicting the next bit in information flow. If diversity is high, then uncertainty is high. If diversity is low, then we can be sure when we pick a random cell from that cluster, it will come from the dominant sample.

diversityPerGroup <- function(df, species, group, diversity_metric = 'simpson') {
  # Convert strings to symbols for curly-curly operator
  species_sym <- rlang::sym(species)
  group_sym <- rlang::sym(group)
  # Count groups per species directly using curly-curly
  tierNcount <- df %>%
    group_by({{species_sym}}) %>%
    count({{group_sym}}, name = "n") %>% ungroup
  # Pivot table to allow vegan::diversity call
  tierNwide <- tierNcount %>%
    pivot_wider(names_from = {{group_sym}}, values_from = n, values_fill = list(n = 0))
  # Use rownames of species for the diversity function, which needs a dataframe
  tierNwide_df <- as.data.frame(tierNwide)
  # species column must be first
  tierNwide_df <- tierNwide_df %>% select({{species}}, everything())
  rownames(tierNwide_df) <- tierNwide_df[, 1]
  tierNwide_df <- tierNwide_df[, -1]
  # Calculate diversity
  diversity_values <- vegan::diversity(tierNwide_df, index = diversity_metric)
  # Prepare result as a dataframe
  result <- data.frame(
    species = rownames(tierNwide_df),
    diversity = diversity_values,
    row.names = NULL
  )
  # Rename diversity column
  names(result)[1] <- species
  names(result)[2] <- sprintf('%s_diversity', group)
  return(result)
}

# Calculate simpson's diversity per cluster
clusterMetrics <- diversityPerGroup(srobj@meta.data,
                        species = 'seurat_clusters',
                        group = 'Sample ID',
                        diversity_metric = 'simpson')

# Calculate number of cells per cluster and join to metrics table
clusterMetrics <- clusterMetrics %>% left_join(srobj@meta.data %>% count(seurat_clusters))

# clusterMetrics

# p1 <- ggplot(clusterMetrics, aes(x = Celltype, y = n)) +
#   geom_bar(stat = "identity", fill = 'black') +
#   scale_y_log10() +
#   labs(x = "Cell Type", y = "Cell Number (log scale)") +
#   theme_minimal() +
#   theme(axis.text.x = element_text(angle = 45, hjust = 1))
clusterMetrics$seurat_clusters <- as.numeric(clusterMetrics$seurat_clusters)

lollipop <- ggplot(clusterMetrics, aes(x = seurat_clusters, y = n)) +
  geom_segment(aes(x = seurat_clusters, xend = seurat_clusters, y = 0, yend = n), size = 1.5, color = 'grey80') + # Draw lines for lollipops
  geom_point(aes(colour = `Sample ID_diversity`), size = 5) + # Add colored lollipop 'heads', coloring by 'Sample ID_diversity'
  scale_y_continuous(trans = 'log', name = "Log-scaled n", breaks = c(seq(1,10),seq(20,100,10), seq(200,1000,100), seq(2000,10000,1000))) + # Beautiful log breaks
  scale_x_continuous(breaks = seq(0,20)) + 
  scale_colour_viridis(option = "C", name = "Sample ID Diversity", direction = 1) + # Colorblind-friendly, vibrant color palette
  theme_minimal(base_size = 10) +
  theme(legend.position = "bottom",,
        axis.text = element_text(size = 12), 
        axis.title = element_text(size = 14), 
        title = element_text(size = 16)) +
  labs(x = "Seurat Clusters", y = "cluster size (log-scaled)", title = "Cluster Metrics") +
  guides(colour = guide_colourbar(title.position = "top", title.hjust = 0.5))

lollipop

p2 <- ggplot(clusterMetrics, aes(y=seurat_clusters, fill=`Sample ID_diversity`, x = 1, label = n)) +
  geom_tile(colour = "white") +
  geom_text(nudge_x = 1.5, size = 5) +
  geom_text(aes(label = signif(`Sample ID_diversity`, 2)),size = 5) +
  scale_fill_distiller(palette = "Spectral", limits = c(0,1)) + theme_classic() +
  coord_fixed(ratio = 0.25) + 
  expand_limits(x = c(0.5,3)) +
  labs(x = "") +
  theme(axis.text.y = element_text(hjust = 1, vjust = 0.5, size = 16),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_text(size = 20),
        legend.key.size = unit(1, 'cm'),
        legend.title = element_text(size=10), 
        legend.text = element_text(size=10)
  )
p2

So in this paper it looks like our clusters are in fact well represented among every sample. None of them are particularly small, but the classical Monocyte cluster contains > 15000 cells. There may be some additional clusters hiding there.

Silhoeutte Analysis

In addition to cell numbers and sample representation, it can be useful to get an idea of how tightly clustered cells are in each cluster, and how distant each cluster is to the others. This way we can quantify how fuzzy the borders are between clusters. Many clustering metrics exist to answer this question (Lim and Qiu 2024) - one popular metric is the average silhouette distance.

In silhouette analysis, for each cell, the average distance between cells in the same cluster is subtracted from the minimum average distance between that cell and cells from the other clusters, minimized across clusters. This value is then divided by the maximum of the two values to scale it to (-1,1) across cells.

In other words, s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}

where:

  • (a(i)) is the average distance from the (i^{th}) point to the other points in the same cluster,
  • (b(i)) is the smallest average distance from the (i^{th}) point to points in a different cluster, minimized over all clusters,
  • (s(i)) is the silhouette score for the (i^{th}) point, ranging from -1 to 1.
seurat_clusters <- srobj@meta.data$seurat_clusters
pca_embeddings <- Embeddings(srobj, reduction = 'pca')

# Calculate silhouette widths
# sil_widths <- silhouette(x = cluster_assignments, dist = dist(pca_embeddings))
sil_scores <- silhouette(x = as.numeric(seurat_clusters), dist = dist(pca_embeddings))

# Extract silhouette scores
silhouette_data <- as.data.frame(sil_scores)
# Recover cell type names
silhouette_data$seurat_clusters <- as.character(seurat_clusters)

row.names(silhouette_data) <- row.names(pca_embeddings)

silhouette_arranged <- silhouette_data %>% group_by(seurat_clusters) %>% arrange(-sil_width)
overall_average <- silhouette_arranged %>% ungroup %>% summarize(ave = as.numeric(mean(sil_width))) %>% pull(ave)

full_silhouettes_plot <- ggplot(silhouette_arranged, aes(y = seurat_clusters, x = sil_width, fill = seurat_clusters, group = seurat_clusters)) +
    geom_bar(stat = "identity", position = 'dodge2') +
    geom_vline(xintercept = overall_average,
               color = 'red',
               linetype = 'longdash') +
    theme_minimal() +
    labs(title = "Silhouette Analysis",
        y = "Cluster",
        x = "Silhouette width",
        fill = "Cluster") +
    theme(axis.text.y = element_text(hjust = 1, vjust = 0.5, size = 20),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_text(size = 20),
        legend.position = "None")

full_silhouettes_plot

srobj$CellID <- row.names(srobj@meta.data)

sil_ids <- silhouette_data %>% rownames_to_column('CellID') %>% left_join(srobj@meta.data)

srobj <- AddMetaData(srobj, sil_ids)

FeaturePlot(srobj, feature = "sil_width") + ggtitle('Silhouette width') + scale_color_viridis_c(limits = c(-1,1)) | d2

The red dotted line here represents the overall average silhouette - in clustering optimization this value can be optimized for to find the “best possible” resolution for a dataset. But sub-clustering can also increase the overall average silhouette distance in ways that changing the resolution cannot. Here, we can inspect each cluster individually and make these calls cluster by cluster.

Cluster taxonomy

Taking the concept of distances between clusters one step further, our lab wrote a package, ARBOL, for calculating and plotting distances between clusters in scRNAseq data (Zheng et al. 2023).

With ARBOL, we can build a dendrogram of cluster centroid distances based on gene expression or some dimensionality reduction in the data. This can be useful for plotting cluster metrics

# ARBOL requires 3 metadata entries
srobj@meta.data$tierNident <- paste0('Cluster ',srobj@meta.data$seurat_clusters)
srobj@meta.data$CellID <- row.names(srobj@meta.data)
srobj@meta.data$sample <- srobj@meta.data$`Sample ID`

srobj <- ARBOLcentroidTaxonomy(srobj, 
                               tree_reduction='pca',
                               categories = c('Celltype')
                               )
Bootstrap (r = 0.48)... Done.
Bootstrap (r = 0.6)... Done.
Bootstrap (r = 0.68)... Done.
Bootstrap (r = 0.8)... Done.
Bootstrap (r = 0.88)... Done.
Bootstrap (r = 1.0)... Done.
Bootstrap (r = 1.08)... Done.
Bootstrap (r = 1.2)... Done.
Bootstrap (r = 1.28)... Done.
Bootstrap (r = 1.4)... Done.
ggraph(srobj@misc$tax_ggraph, layout = 'dendrogram', height=plotHeight) +
  geom_edge_elbow() +
  geom_node_text(aes(filter = leaf, label = name, color = name), nudge_y=8,vjust=0.5,hjust=0,size=8) +
  geom_node_text(aes(filter = leaf, label = n),color='grey30',nudge_y=1,vjust=0.5,hjust=0,size=7) + 
  theme_void() +
  new_scale('color') +
  geom_node_point(aes(filter = leaf, color=sample_diversity),size=4,shape='square') + 
  scale_color_gradient(low='grey90',high='grey10',limits=c(0,1)) +
  coord_flip() + scale_y_reverse() +
  expand_limits(y=-20)

srobj@meta.data$tierNident <- srobj@meta.data$Celltype

srobj <- ARBOLcentroidTaxonomy(srobj, 
                               tree_reduction='pca',
                               categories = c('Celltype')
                               )
Bootstrap (r = 0.48)... Done.
Bootstrap (r = 0.6)... Done.
Bootstrap (r = 0.68)... Done.
Bootstrap (r = 0.8)... Done.
Bootstrap (r = 0.88)... Done.
Bootstrap (r = 1.0)... Done.
Bootstrap (r = 1.08)... Done.
Bootstrap (r = 1.2)... Done.
Bootstrap (r = 1.28)... Done.
Bootstrap (r = 1.4)... Done.
ggraph(srobj@misc$tax_ggraph, layout = 'dendrogram', height=plotHeight) +
  geom_edge_elbow() +
  geom_node_text(aes(filter = leaf, label = name, color = name), nudge_y=10,vjust=0.5,hjust=0,size=8) +
  geom_node_text(aes(filter = leaf, label = n),color='grey30',nudge_y=1,vjust=0.5,hjust=0,size=7) + 
  theme_void() +
  new_scale('color') +
  geom_node_point(aes(filter = leaf, color=sample_diversity),size=4,shape='square') + 
  scale_color_gradient(low='grey90',high='grey10',limits=c(0,1)) +
  coord_flip() + scale_y_reverse() +
  expand_limits(y=-40)

Session Info

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] viridis_0.6.5        viridisLite_0.4.2    RColorBrewer_1.1-3   scales_1.3.0         colorBlindness_0.1.9 plotly_4.10.4        ARBOL_4.0.0          dendextend_1.17.1    cluster_2.1.6        strex_2.0.0          glmGamPoi_1.14.3     ggnewscale_0.4.10    scatterpie_0.2.1     ggalluvial_0.12.5    phyloseq_1.46.0      ape_5.7-1            data.tree_1.1.0      ggraph_2.1.0         tidygraph_1.3.1      ggrepel_0.9.5        ggpubr_0.6.0         ggfortify_0.4.16     Matrix_1.6-5         reshape2_1.4.4       vegan_2.6-4          lattice_0.22-5       permute_0.9-7        harmony_1.2.0        Rcpp_1.0.12          pvclust_2.2-0        limma_3.58.1         future_1.33.1        data.table_1.15.2    lubridate_1.9.3      forcats_1.0.0        stringr_1.5.1        dplyr_1.1.4          purrr_1.0.2          readr_2.1.5          tidyr_1.3.1          tibble_3.2.1         ggplot2_3.5.0        tidyverse_2.0.0      Seurat_5.0.1         SeuratObject_5.0.1   sp_2.1-3             knitr_1.45          

loaded via a namespace (and not attached):
  [1] matrixStats_1.2.0           spatstat.sparse_3.0-3       bitops_1.0-7                httr_1.4.7                  tools_4.3.1                 sctransform_0.4.1           backports_1.4.1             utf8_1.2.4                  R6_2.5.1                    lazyeval_0.2.2              uwot_0.1.16                 mgcv_1.9-1                  rhdf5filters_1.14.1         withr_3.0.0                 gridExtra_2.3               progressr_0.14.0            cli_3.6.2                   Biobase_2.62.0              spatstat.explore_3.2-6      fastDummies_1.7.3           labeling_0.4.3              spatstat.data_3.0-4         ggridges_0.5.6              pbapply_1.7-2               parallelly_1.37.1           generics_0.1.3              gridGraphics_0.5-1          crosstalk_1.2.1             ica_1.0-3                   spatstat.random_3.2-2       vroom_1.6.5                 car_3.1-2                   biomformat_1.30.0           fansi_1.0.6                 S4Vectors_0.40.2            abind_1.4-5                 lifecycle_1.0.4             yaml_2.3.8                  carData_3.0-5               SummarizedExperiment_1.32.0 rhdf5_2.46.1                SparseArray_1.2.4          
 [43] Rtsne_0.17                  grid_4.3.1                  promises_1.2.1              crayon_1.5.2                miniUI_0.1.1.1              cowplot_1.1.3               pillar_1.9.0                GenomicRanges_1.54.1        future.apply_1.11.1         codetools_0.2-19            leiden_0.4.3.1              glue_1.7.0                  ggfun_0.1.4                 vctrs_0.6.5                 png_0.1-8                   spam_2.10-0                 gtable_0.3.4                xfun_0.42                   S4Arrays_1.2.0              mime_0.12                   survival_3.5-8              iterators_1.0.14            statmod_1.5.0               ellipsis_0.3.2              fitdistrplus_1.1-11         ROCR_1.0-11                 nlme_3.1-164                bit64_4.0.5                 RcppAnnoy_0.0.22            GenomeInfoDb_1.38.6         irlba_2.3.5.1               KernSmooth_2.23-22          colorspace_2.1-0            BiocGenerics_0.48.1         ade4_1.7-22                 tidyselect_1.2.0            bit_4.0.5                   compiler_4.3.1              DelayedArray_0.28.0         lmtest_0.9-40               digest_0.6.34               goftest_1.2-3              
 [85] presto_1.0.0                spatstat.utils_3.0-4        rmarkdown_2.25              XVector_0.42.0              htmltools_0.5.7             pkgconfig_2.0.3             MatrixGenerics_1.14.0       fastmap_1.1.1               rlang_1.1.3                 htmlwidgets_1.6.4           shiny_1.8.0                 farver_2.1.1                zoo_1.8-12                  jsonlite_1.8.8              RCurl_1.98-1.14             magrittr_2.0.3              GenomeInfoDbData_1.2.11     dotCall64_1.1-1             patchwork_1.2.0             Rhdf5lib_1.24.2             munsell_0.5.0               reticulate_1.35.0           stringi_1.8.3               zlibbioc_1.48.0             MASS_7.3-60.0.1             plyr_1.8.9                  parallel_4.3.1              listenv_0.9.1               deldir_2.0-2                Biostrings_2.70.2           graphlayouts_1.1.0          splines_4.3.1               tensor_1.5                  multtest_2.58.0             hms_1.1.3                   igraph_2.0.2                spatstat.geom_3.2-8         ggsignif_0.6.4              RcppHNSW_0.6.0              stats4_4.3.1                evaluate_0.23               tzdb_0.4.0                 
[127] foreach_1.5.2               tweenr_2.0.3                httpuv_1.6.14               RANN_2.6.1                  polyclip_1.10-6             scattermore_1.2             ggforce_0.4.2               broom_1.0.5                 xtable_1.8-4                RSpectra_0.16-1             rstatix_0.7.2               later_1.3.2                 IRanges_2.36.0              timechange_0.3.0            globals_0.16.2             

References

Lim, Hong Seo, and Peng Qiu. 2024. “Quantifying the Clusterness and Trajectoriness of Single-Cell RNA-Seq Data.” PLOS Computational Biology 20 (2): e1011866. https://doi.org/10.1371/journal.pcbi.1011866.
Zheng, Hengqi Betty, Benjamin A. Doran, Kyle Kimler, Alison Yu, Victor Tkachev, Veronika Niederlova, Kayla Cribbin, et al. 2023. “Concerted Changes in the Pediatric Single-Cell Intestinal Ecosystem Before and After Anti-TNF Blockade.” eLife 12 (November). https://doi.org/10.7554/eLife.91792.1.